[SPARK-32106][SQL] Implement script transform in sql/core #29414
AngersZhuuuu wants to merge 16 commits into
|
FYI @maropu @cloud-fan |
|
Test build #127370 has finished for PR 29414 at commit
|
|
Test build #127373 has finished for PR 29414 at commit
|
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
NULL DEFINED AS 'NULL'
USING 'cat' AS (a, b, c, d)
Can ROW FORMAT DELIMITED with an output schema work correctly?
Also, could you add test cases for the parser, too? https://github.com/apache/spark/pull/29414/files#diff-36e2b29ae675caaa1fce16e74fbd8710R1135
I found two more bugs:
https://issues.apache.org/jira/browse/SPARK-32607
https://issues.apache.org/jira/browse/SPARK-32608
> Also, could you add test cases for the parser, too? https://github.com/apache/spark/pull/29414/files#diff-36e2b29ae675caaa1fce16e74fbd8710R1135
Added some UTs in transform.sql and fixed the part that was originally wrong.
|
Test build #127402 has finished for PR 29414 at commit
|
|
Test build #127406 has finished for PR 29414 at commit
|
|
Test build #127829 has finished for PR 29414 at commit
|
|
retest this please |
|
Test build #127834 has finished for PR 29414 at commit
|
|
retest this please |
|
Test build #127841 has finished for PR 29414 at commit
|
|
retest this please |
|
Test build #128204 has finished for PR 29414 at commit
|
|
retest this please |
|
Test build #128205 has finished for PR 29414 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #132769 has finished for PR 29414 at commit
|
|
I am good with this change. I don't mind if somebody merges. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #132794 has finished for PR 29414 at commit
|
|
I hope we can merge this so that I can start on the next piece of work. |
|
gentle ping @maropu @cloud-fan |
|
gentle ping @tejasapatil @sameeragarwal |
|
okay, I will merge this into master in a few days if nobody has more comments. |
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #132973 has finished for PR 29414 at commit
|
|
retest this please |
|
Test build #132972 has finished for PR 29414 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #132980 has finished for PR 29414 at commit
|
|
Okay, I believe we have plenty of time to revisit this before Spark v3.2.0 is released, so I'll merge this into master for now to encourage @AngersZhuuuu's follow-up work. FYI: @HyukjinKwon @cloud-fan @dongjoon-hyun @gatorsmile |
|
Merged to master. |
|
Nice, I support your decision @maropu. |
|
Thank you for informing, @maropu ! |
struct<>
-- !query output
org.apache.spark.SparkException
Subprocess exited with status 127. Error: /bin/bash: some_non_existent_command: command not found
This seems to cause test flakiness in GitHub Actions. Could you take a look, @AngersZhuuuu and @maropu?
SQLQueryTestSuite.transform.sql
org.scalatest.exceptions.TestFailedException: transform.sql
Expected "...istent_command: comm[and not found]", but got "...istent_command: comm[]" Result did not match for query #2
SELECT TRANSFORM(a)
USING 'some_non_existent_command' AS (a)
FROM t
This is very flaky. Almost 50% failure probability.
Ah, that's too bad. Thanks for letting me know. I'll open a followup PR to fix it.
Thank you so much for taking a look at that, @maropu .
What changes were proposed in this pull request?
- SparkScriptTransformationExec, based on BaseScriptTransformationExec
- SparkScriptTransformationWriterThread, based on BaseScriptTransformationWriterThread, for writing data
- SparkScripts, to support converting a script LogicalPlan to a SparkPlan in Spark SQL (without Hive mode)
- SparkScriptTransformationSuite, to test Spark-specific cases
- SQLQueryTestSuite

And we will close #29085.
Why are the changes needed?
To allow users to use script transformation without Hive.
Does this PR introduce any user-facing change?
Users can use script transformation without Hive in no-serde mode, in two forms:
- default no-serde mode
- no-serde mode with a specified ROW FORMAT DELIMITED
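The two no-serde forms named above can be sketched roughly as follows. This is only an illustration, not the PR's own examples: the table `t` and its columns `a`, `b`, `c` are hypothetical, `cat` is used as a pass-through script, and the delimiters shown are arbitrary choices.

```sql
-- Default no-serde mode: no ROW FORMAT clause, so the default
-- field and line delimiters are used on both input and output.
-- (Table t and columns a, b, c are assumed for illustration.)
SELECT TRANSFORM(a, b, c)
  USING 'cat' AS (a, b, c)
FROM t;

-- No-serde mode with a specified ROW FORMAT DELIMITED:
-- the first ROW FORMAT clause describes how rows are written to the
-- script, the second describes how the script's output is parsed.
SELECT TRANSFORM(a, b, c)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  NULL DEFINED AS 'NULL'
  USING 'cat' AS (a, b, c)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  NULL DEFINED AS 'NULL'
FROM t;
```

Both queries need a running Spark SQL session and a `cat` binary on the executors; with `cat` as the script, the output rows should mirror the input rows, which makes it a convenient smoke test for the row format settings.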
How was this patch tested?
Added unit tests.